Intrinsic dimension estimation of data by principal component analysis
Abstract
Estimating the intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool for discovering the dimensionality of data sets with a linear structure; it becomes ineffective, however, when the data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate the intrinsic dimension of data with nonlinear structures. Our method first finds a minimal cover of the data set, then performs PCA locally on each subset in the cover, and finally produces the estimate by checking the data variance over all small neighborhood regions. The proposed method uses the whole data set to estimate its intrinsic dimension and is convenient for incremental learning. In addition, our new PCA procedure can filter out noise in the data and converges to a stable estimate as the neighborhood region size increases. Experiments on synthetic and real-world data sets show the effectiveness of the proposed method.
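The three steps named in the abstract (cover, local PCA, variance check) can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy k-nearest-neighbour cover, the 95% explained-variance threshold, the majority vote over patches, and all function names (`greedy_cover`, `local_pca_dimension`, `estimate_intrinsic_dimension`) are assumptions chosen for illustration only.

```python
import numpy as np


def local_pca_dimension(points, var_threshold=0.95):
    """Count the principal components needed to reach var_threshold of the variance."""
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variance along each principal axis.
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0:
        return 0
    ratios = np.cumsum(variances) / total
    count = int(np.searchsorted(ratios, var_threshold)) + 1
    return min(count, len(variances))


def greedy_cover(data, k):
    """Cover the data with k-nearest-neighbour patches.

    Assumption: a simple greedy stand-in, not a true minimal cover.
    """
    covered = np.zeros(data.shape[0], dtype=bool)
    patches = []
    while not covered.all():
        center = int(np.flatnonzero(~covered)[0])
        dists = np.linalg.norm(data - data[center], axis=1)
        patch = np.argsort(dists)[:k]
        patches.append(patch)
        covered[patch] = True
        covered[center] = True  # guarantees progress even with duplicate points
    return patches


def estimate_intrinsic_dimension(data, k=20, var_threshold=0.95):
    """Run PCA on every patch and return the most common local dimension."""
    patches = greedy_cover(data, k)
    local_dims = [local_pca_dimension(data[p], var_threshold) for p in patches]
    return int(np.bincount(local_dims).argmax())


# Hypothetical test data: a helix, i.e. a 1-dimensional curve embedded in 3-D space.
t = np.linspace(0.0, 4.0 * np.pi, 2000)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])

global_dim = local_pca_dimension(helix)               # PCA on the whole data set
local_dim = estimate_intrinsic_dimension(helix, k=20)  # PCA on small neighborhoods
print(global_dim, local_dim)
```

On this helix, global PCA needs all three components to reach the 95% threshold, while the patch-wise estimate is 1, which matches the point made in the abstract about nonlinear structure. The neighborhood size `k` and the threshold are tuning choices in this sketch; the paper's own variance check and cover construction may differ.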
Related resources
Intrinsic Dimension Estimation by Maximum Likelihood in Probabilistic PCA
A central issue in dimension reduction is choosing a sensible number of dimensions to be retained. This work demonstrates the asymptotic consistency of the maximum likelihood criterion for determining the intrinsic dimension of a dataset in an isotropic version of Probabilistic Principal Component Analysis (PPCA). Numerical experiments on simulated and real datasets show that the maximum likelih...
Estimating the intrinsic dimensionality of hyperspectral images
Estimating the intrinsic dimensionality (ID) of an intrinsically low (d-) dimensional data set embedded in a high (n-) dimensional input space by conventional Principal Component Analysis (PCA) is computationally hard because PCA scales cubically (O(n^3)) with the input dimension [11]. Besides this computational drawback, global PCA will overestimate the ID if the data manifold is curved. In this pap...
Feature Dimension Reduction of Multisensor Data Fusion using Principal Component Fuzzy Analysis
Much current research, across many applications and with many different tools, focuses on how to achieve awareness. One serious application is awareness of the behavior and activities of patients, which matters because of the need for ubiquitous medical care for individuals. It is sometimes very important that the doctor knows the patient's physical condition. O...
Forecasting Financial Time Series through Intrinsic Dimension Estimation and Non-Linear Data Projection
A crucial problem in non-linear time series forecasting is to determine its auto-regressive order, in particular when the prediction method is non-linear. We show in this paper that this problem is related to the fractal dimension of the time series, and suggest using the Curvilinear Component Analysis (CCA) to project the data in a non-linear way on a space of adequately chosen dimension, befo...
Intrinsic Dimensionality Estimation in Visualizing Toxicity Data
Over the years, a number of dimensionality reduction techniques have been proposed and used in chemoinformatics to perform nonlinear mappings. Nevertheless, data visualization techniques can be applied efficiently for dimensionality reduction mainly when the data are not really high-dimensional and can be represented as a nonlinear low-dimensional manifold, when it is possible to reduce...
Journal: CoRR
Volume: abs/1002.2050
Publication date: 2010